Although the main objective of this work is to use the knn algorithm to get a 3D geological model of the LRD, the process of obtaining the model implies some techniques and knowledge of Python programing language. We are going to start this notebook with a simple introduction to draw 3D figures with the Python library plotly.graph_objects.
We begin by defining the routes and directories where the data and figures will be located. The figures can be saved in different format: jpg, html, etc. but they can also be shown in this notebook.
DATADIR='data/' # Directory with the data
FIGURESDIR='figures/' # Figures produced
We have used Google Earth to draw the delta contour and we have created the file 'deltacontour.csv' with the corresponding data. The data consist in georeferenced points with the UTM_X and UTM_Y coordinates and a fixed height for all points (Cota) that we choose to be 7 meters. We could have taken the real height at each point but this disturbs the image and we prefer to take a flat figure for the delta contour. To handle this kind of data we use the Python library Pandas so we have to import it.
import pandas as pd
And now we can read the 'deltacontour.csv' file.
delta_contour=pd.read_csv(DATADIR+'deltacontour.csv') # delta contour data
delta_contour
| UTM_X | UTM_Y | Cota | |
|---|---|---|---|
| 0 | 417298.700498 | 4.581675e+06 | 7.0 |
| 1 | 417640.559097 | 4.581045e+06 | 7.0 |
| 2 | 417758.997444 | 4.579994e+06 | 7.0 |
| 3 | 417952.233058 | 4.579820e+06 | 7.0 |
| 4 | 418395.999372 | 4.578868e+06 | 7.0 |
| ... | ... | ... | ... |
| 76 | 420770.000000 | 4.581282e+06 | 7.0 |
| 77 | 420101.129561 | 4.581458e+06 | 7.0 |
| 78 | 420032.624113 | 4.581789e+06 | 7.0 |
| 79 | 419642.056620 | 4.582375e+06 | 7.0 |
| 80 | 419154.263400 | 4.582825e+06 | 7.0 |
81 rows × 3 columns
We now import the plotting libraries.
# import plotting libraries
import plotly.offline as go_offline
import plotly.graph_objects as go
import matplotlib.pyplot as plt
import matplotlib as mpl
We have to adapt and extract from the data we have at each moment what is really needed to draw the figure we want. This can be done directly with Python commands but, since we are going to apply the process many times and to many different data files, we have defined the auxiliary function coordinates(data,positions) to help with this task. For example, to create a 3D figure with the the delta contour file we take data=DATADIR+'deltacontourn.cs and positions=[0,1,2, that are the position were the UTM_X, UTM_Y and Cota are. So we have to input coordinates(DATADIR+'deltacontourn.csv',[0,1,2].
All the auxiliary functions that we need are define in the file functions.py that can be found in the repository. We import all auxiliary functions.
# import customed auxiliary functions
import functions
from functions import *
# x, y, z list of coordinates of the points in the delta contour
xyzcontour=coordinates(DATADIR+'deltacontour.csv',[0,1,2])
The variable xyzcontour provides the main data for the figure, but we also need to indicate the figure margins, we do this again with an auxiliary function that we call bound.
bound=bounds(xyzcontour) # bounds for the figures
We are ready to draw the 3D delta contour figure. We can display the figure in this notebook with the figure.show() command or we can save the figure as an HTML file.
fig=go.Figure()
# the contourn data with lines mode
fig.add_trace(go.Scatter3d(x=xyzcontour[0],y=xyzcontour[1],z=xyzcontour[2], mode="lines",
line_width=5,
name='Delta Contour',
marker = dict(
size = 4,
color = 'black'
)))
# the figure title and others figure layout
fig.update_layout( title="3D Llobregat Delta Contour, Z scale is x 50.",
scene=dict(aspectratio=dict(x=2, y=2, z=0.5),
xaxis = dict(range=[bound[0]-2000,bound[1]+2000],),
yaxis = dict(range=[bound[2]-2000,bound[3]+2000])
))
fig.show()
# we also save the figure in html format
go_offline.plot(fig,filename=FIGURESDIR+'3D Llobregat Delta Contour.html',validate=True, auto_open=False)
'figures/3D Llobregat Delta Contour.html'
Now let's read the data from the boreholes and drop the data that we are not going to use.
# the boreholes data
boreholes_data=pd.read_excel(DATADIR+'hsd new basements.xls')
# drop unused colum
boreholes_data=boreholes_data.drop(columns=['Codi'])
boreholes_data
| UTM_X | UTM_Y | Cota | Valor | Clase | |
|---|---|---|---|---|---|
| 0 | 419880.0 | 4580670.0 | -21 | S | NaN |
| 1 | 419880.0 | 4580670.0 | -20 | 7 | gravillas y gravas |
| 2 | 419880.0 | 4580670.0 | -19 | 7 | gravillas y gravas |
| 3 | 419880.0 | 4580670.0 | -18 | 7 | gravillas y gravas |
| 4 | 419880.0 | 4580670.0 | -17 | 7 | gravillas y gravas |
| ... | ... | ... | ... | ... | ... |
| 33360 | 419850.0 | 4569390.0 | -182 | S | NaN |
| 33361 | 419850.0 | 4569390.0 | -183 | S | NaN |
| 33362 | 419850.0 | 4569390.0 | -184 | S | NaN |
| 33363 | 419850.0 | 4569390.0 | -185 | S | NaN |
| 33364 | 419850.0 | 4569390.0 | -186 | S | NaN |
33365 rows × 5 columns
We are going to include the locations of the boreholes to a new figure, for that we have to prepare the data and we are going to use the function coordinates for this task. But the first input to this function coordinates is a csv file, so we save the boreholes_dat to a csv file and then apply the coordinates function.
# save the borehole data as a csv file
boreholes_data.to_csv(DATADIR+'boreholes.csv',index=False)
# the list of x,y,z coordinates of the boreholes
xyzboreholes=coordinates(DATADIR+'boreholes.csv',[0,1,2])
fig=go.Figure()
# we add marks for each borehole data point
fig.add_trace(go.Scatter3d(x=xyzboreholes[0], y=xyzboreholes[1], z=xyzboreholes[2],
mode ='markers',
name='boreholes',
marker = dict(size = 1,
color ='red',
opacity = 1,
symbol='circle')
))
# the contour data
fig.add_trace(go.Scatter3d(x=xyzcontour[0],y=xyzcontour[1],z=xyzcontour[2], mode="lines",
line_width=5,
name='Delta Contour',
marker = dict(
size = 4,
color = 'black'
)))
# the layout
fig.update_layout( title="3D Llobregat Delta Contour, Z scale is x 50.",
scene=dict(aspectratio=dict(x=2, y=2, z=0.5),
xaxis = dict(range=[bound[0]-2000,bound[1]+2000],),
yaxis = dict(range=[bound[2]-2000,bound[3]+2000])
))
fig.show()
# we also save the figure in html format
go_offline.plot(fig,filename=FIGURESDIR+'3D Contour and boreholes.html',validate=True, auto_open=False)
'figures/3D Contour and boreholes.html'
Since the data will get more and more complicated, the number of lines to create the figures will increase more and more, so we have defined auxiliary functions to manipulate the data to make the figure input look as easy as possible.
For example the auxiliary function data_p(list,names,colors,symbols,size) manipulates the data in list (which must be a list of lists of xyx coordinates) to produce the necessary input in a figure to draw a determinate symbol (given in the list simbol) in a determinate position (given in lis) with a specific name in the figure leyend and a specific size (given by the variable size)
For example, to produce the same figure 3D Llobregat Delta Contour and boreholes.htm we can also run the following code.
# process the marks data with the auxiliary function data_p
data_boreholes_marks=data_p([xyzboreholes],['boreholes'],['red'],['circle'],1)
# an alternative way to define the same figure
fig=go.Figure(data=data_boreholes_marks)
fig.add_trace(go.Scatter3d(x=xyzcontour[0],y=xyzcontour[1],z=xyzcontour[2], mode="lines",
line_width=5,
name='Delta Contour',
marker = dict(
size = 4,
color = 'black'
)))
fig.update_layout( title="3D Llobregat Delta Contour and boreholes, Z scale is x 50.",
scene=dict(aspectratio=dict(x=2, y=2, z=0.5),
xaxis = dict(range=[bound[0]-2000,bound[1]+2000],),
yaxis = dict(range=[bound[2]-2000,bound[3]+2000])
))
#fig.show()
go_offline.plot(fig,filename=FIGURESDIR+'3D Contour and boreholes alternative.html',validate=True, auto_open=False)
'figures/3D Contour and boreholes alternative.html'
We want to draw only data inside the delta contour in our figures, to do that first we use the Python Geometry library and its Polygon function to create a polygon with the delta contour and then we use the auxiliary function nearby(xyz_list,polygon,distance) that eliminate from xyz_list all those that are not within a distance given by the variable distance of the polygon polygon.
import shapely.geometry as geometry
# a polygon constructed from the points in the delta contour
contour_poly=geometry.Polygon(zip(delta_contour['UTM_X'],
delta_contour['UTM_Y']))
# we select the points within the polygon or at a distance less than 300 m from it
xyzboreholes_near=nearby(xyzboreholes,contour_poly,300)
With the data close to the contour we remake the figure 3D Llobregat Delta Contour and boreholes.htm taking only the boreholes near the perimeter contour.
# the data for the marks of the points in xyzboreholes_near
data_boreholes_marks_near=data_p([xyzboreholes_near],['boreholes'],['red'],['circle'],1)
fig=go.Figure(data=data_boreholes_marks_near)
fig.add_trace(go.Scatter3d(x=xyzcontour[0],y=xyzcontour[1],z=xyzcontour[2], mode="lines",
line_width=5,
name='Delta Contour',
marker = dict(
size = 4,
color = 'black'
)))
fig.update_layout( title="3D Llobregat Delta Contour and boreholes near the contourn, Z scale is x 50.",
scene=dict(aspectratio=dict(x=2, y=2, z=0.5),
xaxis = dict(range=[bound[0]-2000,bound[1]+2000],),
yaxis = dict(range=[bound[2]-2000,bound[3]+2000])
))
fig.show()
go_offline.plot(fig,filename=FIGURESDIR+'3D Contour and boreholes near.html',validate=True, auto_open=False)
'figures/3D Contour and boreholes near.html'
The boreholes data is a 3D cloud of points in the LRD, each one with a concrete granulometry. We want to process this data. We start by defining four classes of material depending on the granulometry of the sediments we found at each location, these classes are: clay-silt (< 1 mm), coarse sand (1–5 mm), gravel (> 5mm) and finally the Pliocene of older basement. We also want to classify these materials by horizontal sections or layers separated 5 m one from other (0 to 120 m.b.s.l). We chose the 5 m distance betwenn layers because it is enough representative taking into account the ratio vertical-horizontal (m-km) size of the Quaternary LDR.
We use Pandas commands to do the job but we again define an auxillary function layer_function(data,h) to makes things easier to handle. In this function the variable h indicates the heigth where the layer is located.
The function layer_function(data,h) returns a list of lenght 4. If l_f denotes the output of layer_function(boreholes_data,h),
l_f[0] we select the columns in boreholes_data corresponding to those whose height (variable 'Cota') is equal to h.l_f[1] we select from l_f[0] the points corresponding to gravels (variable 'Clase' equal to 'gravillas').l_f[2] we select from l_f[0] the points corresponding to sands (variable 'Clase' equal to 'arenas').l_f[3] we select from l_f[0] the points corresponding to clays (variable 'Clase' equal to 'arcillas').l_f[4] we select from l_f[0] the points corresponding to basement (variable 'Clase' equal to 'S').For example, if we apply the function layer_function to boreholes_data at height -75 m we get.
# an example, the layer data at -75 m of height
l_f75=layer_function(boreholes_data,-75)
l_f75[0]
| UTM_X | UTM_Y | Cota | Valor | Clase | |
|---|---|---|---|---|---|
| 4394 | 428450.0 | 4577240.0 | -75 | 0.45 | arcillas y limos |
| 4642 | 427270.0 | 4576580.0 | -75 | 0.45 | arcillas y limos |
| 5356 | 427295.0 | 4576555.0 | -75 | 1.15 | arenas |
| 5690 | 429680.0 | 4577140.0 | -75 | 0.16 | arcillas y limos |
| 10344 | 428261.0 | 4574247.0 | -75 | 0.001 | arcillas y limos |
| ... | ... | ... | ... | ... | ... |
| 32109 | 415410.0 | 4571150.0 | -75 | S | NaN |
| 32275 | 416150.0 | 4571840.0 | -75 | S | NaN |
| 32437 | 416350.0 | 4571810.0 | -75 | S | NaN |
| 32857 | 413510.0 | 4568960.0 | -75 | S | NaN |
| 33036 | 413508.0 | 4569413.0 | -75 | S | NaN |
99 rows × 5 columns
l_f75[1]
| UTM_X | UTM_Y | Cota | Valor | Clase | |
|---|---|---|---|---|---|
| 16997 | 429733.0 | 4572853.0 | -75 | 30 | gravillas y gravas |
| 17082 | 429097.0 | 4572047.0 | -75 | 30 | gravillas y gravas |
| 17158 | 429094.0 | 4572092.0 | -75 | 30 | gravillas y gravas |
| 17243 | 429739.0 | 4572845.0 | -75 | 30 | gravillas y gravas |
| 17765 | 417980.0 | 4570110.0 | -75 | 7 | gravillas y gravas |
| 21995 | 419850.0 | 4569390.0 | -75 | 65 | gravillas y gravas |
We have decided to take data every 5 meters of depth. To do this, we define a list with the heights (the Z coordinate) that we delimit between 0 and -120 meters. This list will be used to select from boreholes_data those at each height.
# the list of selected heights
Heights=[-z for z in range(0,121,5)]
Now that we can split the boreholes_dat by layers and granulometry, we want to predict what material can be found at any position inside the delta contour. But this task is too ambitious and we have to reduce our perspectives, what we do is to define a frame in the x and y axis limits in which we will establish a grid (whose number of rows and columns must be chosen in each case) with the points that we want to predict.
# the framework and limits of predictions
FRAME=500
axis=[boreholes_data.UTM_X.min()-FRAME,boreholes_data.UTM_X.max()+FRAME,
boreholes_data.UTM_Y.min()-FRAME,boreholes_data.UTM_Y.max()+FRAME]
minx_rounded = 1000 * round(axis[0]/1000)
maxx_rounded = 1000 * round(axis[1]/1000)
Now we are ready to employ the Python Knn function given by the scikit-learn community.
We define again an auxiliary function to make things easy. This function is knn(boreholes_data,contour_poly,height,axis,grid). Then we apply this function on each lithosome and height on Heights to create the predicted points for each lithosome.
The knn function creates a grid of size grid $\times$ grid limited by the parameters in axis (min X, max X, min Y, max Y). For each point in the grid, it predices the granulometry by looking at the nearest data point (in boreholes_data) in the same layer (determined by height). After that, the prediction of the points outside the polygon contour_poly is set to nan so they are not drawn in the subsequent figures. Finally, a 3-uple is retourned, it contains the following: the first component is a list of lists with the X coordinates of the points in the grid (the same list of X coordinates appear repeated grid times); the second component contains the same for the Y coordinates; the third component is a matrix, where the element at $(i,j)$ contains the prediction for the point $(i,j)$ in the grid, or nan if the point is outside the polygon.
We use the following keys: 0 for gravels, 1 for sands, 2 for clays and 3 for the basement.
Below, an example of this knn function is given for a grid of width 150.
# knn algorith predictions for a grid of width 150
data_split=[[],[],[],[]]
for h in Heights:
cc=knn(boreholes_data,contour_poly,h,axis,150)
for c in range(4):
x=[cc[0][i,j] for i in range(cc[0].shape[0]) for j in range(cc[0].shape[1]) if cc[2][i,j]==c]
y=[cc[1][i,j] for i in range(cc[1].shape[0]) for j in range(cc[1].shape[1]) if cc[2][i,j]==c]
z=[h for i in range(cc[0].shape[0]) for j in range(cc[0].shape[1]) if cc[2][i,j]==c]
data_split[c].append([x,y,z])
gravels_knn=data_split[0] # knn predictions for gravels
sands_knn=data_split[1] # knn predictions for sands
clays_knn=data_split[2] # knn predictions for clays
basement_knn=data_split[3] # knn predictions for basement
The data stored in the above code lines is in the appropriate shape to be used to draw a figure with any lithosome at any layer. For example, with the next lines of code we are going to draw a figure with all lithosomes found at height -75 m (-75 is the height that appear in the list Heigh with index 15).
Heights[15]
-75
# select data at height -75 m
gravels_knn75=data_split[0][15]
sands_knn75=data_split[1][15]
clays_knn75=data_split[2][15]
basement_knn75=data_split[3][15]
# we prepare the data to draw the predicted data at height -75 m
data_gravels_knn75=data_p([gravels_knn75],['gravels height -75'],['skyblue'],['circle'],3)
data_sands_knn75=data_p([sands_knn75],['sands height -75'],['lemonchiffon'],['circle'],3)
data_clays_knn75=data_p([clays_knn75],['clays -75'],['whitesmoke'],['circle'],3)
data_basement_knn75=data_p([basement_knn75],['basement height -75'],['darkgoldenrod'],['circle'],3)
data_layer75=data_gravels_knn75+data_sands_knn75+data_clays_knn75+data_basement_knn75
fig=go.Figure(data=data_layer75)
fig.add_trace(go.Scatter3d(x=xyzcontour[0],y=xyzcontour[1],z=xyzcontour[2], mode="lines",
line_width=5,
name='Delta Contour',
marker = dict(
size = 4,
color = 'black'
)))
fig.update_layout( title="3D Llobregat Delta knn prediction at height -75 m, Z scale is x 50.",
scene=dict(aspectratio=dict(x=2, y=2, z=0.5),
xaxis = dict(range=[bound[0]-2000,bound[1]+2000],),
yaxis = dict(range=[bound[2]-2000,bound[3]+2000])
))
fig.show()
go_offline.plot(fig,filename=FIGURESDIR+'3D Llobregat Delta knn prediction at height -75 m.html',validate=True, auto_open=False)
'figures/3D Llobregat Delta knn prediction at height -75 m.html'
We can represent all layers in the same figure.
# data for gravel predictions at each height
gravels_names=['gravels height '+str(h) for h in Heights]
gravels_colors=['skyblue' for h in Heights]
gravels_symbols=['circle' for h in Heights]
gravels_data=data_p(gravels_knn,gravels_names,gravels_colors,gravels_symbols,3)
# data for sands predictions at each height
sands_names=['sands height '+str(h) for h in Heights]
sands_colors=['lemonchiffon' for h in Heights]
sands_symbols=['circle' for h in Heights]
sands_data=data_p(sands_knn,sands_names,sands_colors,sands_symbols,3)
# data for clays predictions at each height
clays_names=['clays height '+str(h) for h in Heights]
clays_colors=['whitesmoke' for h in Heights]
clays_symbols=['circle' for h in Heights]
clays_data=data_p(clays_knn,clays_names,clays_colors,clays_symbols,3)
# data for basement predictions at each height
basement_names=['basement height '+str(h) for h in Heights]
basement_colors=['darkgoldenrod' for h in Heights]
basement_symbols=['circle' for h in Heights]
basement_data=data_p(basement_knn,basement_names,basement_colors,basement_symbols,3)
We want to add to the figure the contour shape at each height, so we need to create the contour data for this.
# data for a contour polygon at each height
zcontour=[[h+0.3 for i in range(len(xyzcontour[0]))] for h in Heights]
contourn_data=[go.Scatter3d(x=xyzcontour[0],y=xyzcontour[1],z=a, mode="lines",
line_width=5,
name='_',
showlegend=False,
opacity=0.1,
marker = dict(
size = 2,
opacity=0.1,
color = 'gray'
))
for a in zcontour[1:]]
To maintain the aspect ratio of the HTML image we are also going to add invisible marks at each height in one corner of the image. We don't want to add a name for these marks to the legend so we use the function data_p_nl with is similar to data_p but data_p_n skips the legend.
# a list of invisible marks to maintain the aspect ratio of the figure
marks=[[bound[0]-100 for h in Heights],[bound[2]-100 for h in Heights],Heights]
marks_data=data_p_nl([marks],['marks'],['ghostWhite'],['cross'],1)
We are ready to implement the HTML 3d figure
3D_horizontal_sections_LRD.htm, which will be located in the figures directory.
If this figure weights too much to be handled by your browser you can reduce the size of the Heights
list by taking, for example, values every 10 meters instead of every 5 meters. The reader may tune this parameter according to the computational resources available, the definition desired for the figure and the number of points in the dataset. If you have a sparser data set, you may want to take a layer every meter (or whatever interval between measurements you have) to use the most number possible of data points to form the lithosomes.
fdata=gravels_data+sands_data+clays_data+basement_data+marks_data+contourn_data
fig=go.Figure(data=fdata)
fig.add_trace(go.Scatter3d(x=xyzcontour[0],y=xyzcontour[1],z=xyzcontour[2], mode="lines",
line_width=5,
name='Delta Contour',
marker = dict(
size = 4,
color = 'black'
)))
fig.update_layout( title="3D boreholes Llobregat Delta, Z scale is x 50.",
scene=dict(aspectratio=dict(x=2, y=2, z=0.5),
xaxis = dict(range=[bound[0]-2000,bound[1]+2000],),
yaxis = dict(range=[bound[2]-2000,bound[3]+2000])
))
fig.show()
go_offline.plot(fig,filename=FIGURESDIR+'3D_horizontal_sections_LRD.html',validate=True, auto_open=False)